Abstract

It is known that random $k$-CNF formulas have a so-called satisfiability threshold at a density (namely, clause-variable ratio) of roughly $2^k \ln 2$: at densities slightly below this threshold almost all $k$-CNF formulas are satisfiable, whereas slightly above this threshold almost no $k$-CNF formula is satisfiable. In the current work we consider satisfiable random formulas, and inspect another parameter - the diameter of the solution space (that is, the maximal Hamming distance between a pair of satisfying assignments). It was previously shown that for all densities up to a density slightly below the satisfiability threshold the diameter is almost surely at least roughly $n/2$ (and $n$ at much lower densities). At densities very much higher than the satisfiability threshold, the diameter is almost surely zero (a very dense satisfiable formula is expected to have only one satisfying assignment). In this paper we show that for all densities above a density that is slightly above the satisfiability threshold (more precisely at ratio $(1+\epsilon)2^k \ln 2$, with $\epsilon = \epsilon(k)$ tending to 0 as $k$ grows) the diameter is almost surely $O(k2^{-k}n)$. This shows that a relatively small change in the density around the satisfiability threshold (a multiplicative $(1+\epsilon)$ factor) makes a dramatic change in the diameter. This drop in the diameter cannot be attributed to the fact that a larger fraction of the formulas is not satisfiable (and hence have diameter 0), because the non-satisfiable formulas are excluded from consideration by our conditioning that the formula is satisfiable.
1 Introduction

The computational complexity of Boolean formula satisfiability has been the focus of intensive research for decades. Recently, a promising approach to understanding the algorithmic difficulty of $k$-SAT has emerged, in the form of rigorous analysis of the structural properties of formulas drawn at random from certain distributions. For example, a natural distribution which has been studied extensively is the uniform distribution over $k$-CNF formulas with exactly $m$ clauses over $n$ variables. We denote this distribution by $\mathcal{F}_{n,m,k}$. Despite its simple description, many fundamental properties of this model are yet to be understood. For example, the computational complexity of deciding if a random formula is satisfiable and of finding a satisfying assignment are both major open problems [15], [22].

The clause to variable ratio $m/n$ of a formula is referred to as the density of the formula. The random model $\mathcal{F}_{n,m,k}$ exhibits a "phase transition" in satisfiability, where sparse formulas are likely to be satisfiable whereas dense formulas are unlikely to be satisfiable. Moreover, this phase transition happens at a very short density interval. There exists a satisfiability threshold $d_k = d_k(n)$ such that $k$-CNF formulas with density $m/n > d_k$ are not satisfiable whp^1, while formulas with $m/n < d_k$ are satisfiable whp [18]. A first-moment-method calculation provides an upper bound of $d_k \le 2^k \ln 2$, and the threshold is conjectured to be within a constant distance of this upper bound (for all values of $k$). A lower bound of $2^k \ln 2 - O(k)$ was established rigorously using a weighted second-moment method in [3].

For a satisfiable $k$-CNF formula $F$, let $r_{\max}(F)$ be the maximal Hamming distance between a pair of satisfying assignments of $F$. In this paper we study the behavior of $r_{\max}(F)$ as a function of the density. Specifically, we will consider random satisfiable formulas, and ask what the typical value of $r_{\max}$ is likely to be at various densities. Observe that as one adds more clauses to a formula, the set of satisfying assignments can only shrink, and hence $r_{\max}$ can only decrease. This indicates that the typical value of $r_{\max}$ should decrease as the density increases. However, when the formula becomes unsatisfiable, the formula is discarded from consideration. Since the formulas of lowest diameter (diameter 0) are those discarded from consideration, and their proportion increases as the density increases, this may conceivably lead to a situation in which as the density increases the expected diameter increases rather than decreases. In particular, there does not seem to be an a-priori reason why the threshold for satisfiability should correspond to a threshold behavior also with respect to the diameter of satisfiable formulas.

Let us review what is known about $r_{\max}(F)$ at densities below the satisfiability threshold. For $m/n < 2^{k-1} \ln 2$ we know that all but an $o(1)$-fraction of the formulas satisfy $r_{\max}(F) = n$ (this is because they are satisfied as NAE-$k$-SAT instances [2]). The results in [4] imply that for $m/n = (1-\delta)2^k \ln 2$, $\delta \in (0, 1/3)$, for all but an $o(1)$-fraction of satisfiable $k$-CNF formulas $r_{\max}(F)$ is at least roughly $n/2$ (this is true for $k > k_0$, $k_0 = k_0(\delta)$). This large diameter is due to the existence of many small clusters of satisfying assignments, which are "spread" in the space of all $2^n$ possible assignments. Physicists conjecture that this picture persists up to the so-called condensation point at $2^k \ln 2 - c_k$, for some constant $c_k$, at which point the number of remaining clusters drops to polynomial and then maybe to constant. True or not, this conjecture does not imply that $r_{\max}(F)$ becomes small, because it can remain of value roughly $n/2$ even when there are only two clusters. For densities much higher than the satisfiability threshold (by a factor of roughly $\log n$), the typical value of $r_{\max}(F)$ is 0, because such formulas, if satisfiable, are likely to have only one satisfying assignment (see for example [8] for the case of 3-CNF). This shows that the diameter of random satisfiable formulas does undergo a phase transition as the density increases (starting at $n$, and eventually reaching 0), but it is not clear whether there is any density that serves as a threshold around which there is a sharp drop in diameter.

^1 We say a sequence of events holds with high probability (whp) to mean with probability tending to 1 as $n$ tends to infinity.

In this paper we show that: 

Theorem 1. For all $k > 20$ and $m/n > (1 + 0.99^k)2^k \ln 2$, all but an $o(1)$-fraction of satisfiable $k$-CNF formulas $F$ with $m$ clauses over $n$ variables satisfy

$$r_{\max}(F) \le 50k2^{-k}n.$$

Our result proves that there occurs a transition from a typical structure of satisfying assignments which are wide-spread in the $n$-dimensional binary cube, to a structure where all satisfying assignments are typically contained in a ball of small diameter. The window in which this phase transition occurs is contained in $[(1-\epsilon_1)2^k \ln 2, (1+\epsilon_2)2^k \ln 2]$, where both $\epsilon_1, \epsilon_2$ tend to 0 as $k$ grows.

Here are a few interesting observations regarding this phase transition. 

1. The threshold phenomenon in $r_{\max}$ occurs at a window of densities that lies around $2^k \ln 2$, and whose width is a low-order term w.r.t. $2^k$. Since we are considering only satisfiable $k$-CNF formulas (below or above the threshold), there is no a-priori reason for this threshold to be found in the vicinity of the satisfiability threshold (as the latter is irrelevant for such formulas). Still, as our result shows, this is the case.

2. Since we are looking at satisfiable formulas, this is not a product distribution. Therefore some methods for establishing threshold behaviors (such as [18]) are not applicable.

3. Consider the property of having a diameter of at least $r$. This is not necessarily a monotone property of the density (at least we are not aware of an easy proof that it is). Again, this shows that some approaches to prove the existence of such a threshold (such as [18]) may not be applicable.

4. Typically $r_{\max} = n$ for $m/n < 2^k \ln 2/2$. This is because at those ratios most formulas are satisfiable as NAE-$k$-SAT formulas [2] (in which case for every satisfying assignment in the NAE manner, also its complement at distance $n$ is satisfying). Numerical calculations using tools from statistical physics predict that at $2^k \ln k/k$ there is a phase transition from a typical structure of a big connected ball of satisfying assignments into many small balls of satisfying assignments (which are called clusters). Observe that $2^k \ln k/k \le 2^k \ln 2/2$ for all $k > 3$ (with equality at $k = 4$), therefore while there is a major change in the structure of the solution space, $r_{\max}$ is not affected.
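
As a quick sanity check of the comparison in observation 4, the following sketch (not part of the original analysis; the function name is ours) evaluates $\ln 2/2 - \ln k/k$, which is the gap between the NAE density $2^k \ln 2/2$ and the predicted clustering density $2^k \ln k/k$ after dividing by $2^k$.

```python
import math

def nae_minus_clustering(k):
    """Return ln(2)/2 - ln(k)/k, i.e. (2^k ln2/2 - 2^k ln k/k) divided by 2^k."""
    return math.log(2) / 2 - math.log(k) / k

# A negative value means the clustering density lies above the NAE density.
for k in range(3, 11):
    print(k, round(nae_minus_clustering(k), 4))
```

The gap is negative for $k = 3$, vanishes at $k = 4$ (since $\ln 4/4 = \ln 2/2$), and is positive from $k = 5$ onwards.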

Let us briefly discuss what happens for $k < 20$. Our approach assumes that $(2 \cdot 0.99)^k$ is a low-order term compared with $2^k$. This is however not true (or not relevant) when $k$ is small. Also, the fact that we have a constant like 50 in the bound on $r_{\max}$ makes the result trivial for small values of $k$. On the other hand, for fixed $k$ (say $k = 3$) one can numerically estimate the value of $r_{\max}$ (via the same methods used in the proof of Theorem 1, just figuring out the exact numerics instead of the rigorous, less tight, estimation that we perform). For example, for $k = 3$ the numerics show that typically $r_{\max} \le 0.2n$ for density $m/n = 7.625$ (which is $\approx 1.375 \cdot 2^k \ln 2$ for $k = 3$).
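
The two numerical remarks above are easy to reproduce. The sketch below (our own helper names, given only as an illustration) computes the ratio $50k2^{-k}$ from Theorem 1, which must drop below 1 before the bound says anything, and expresses a density as a multiple of $2^k \ln 2$.

```python
import math

def bound_over_n(k):
    """Theorem 1's bound on r_max divided by n, i.e. 50 * k * 2^(-k)."""
    return 50 * k / 2 ** k

def density_multiplier(density, k):
    """Density m/n expressed as a multiple of 2^k ln 2."""
    return density / (2 ** k * math.log(2))

# Smallest k for which the bound 50 k 2^(-k) n is smaller than the trivial bound n.
smallest_nontrivial_k = next(k for k in range(1, 64) if bound_over_n(k) < 1)
print(smallest_nontrivial_k, density_multiplier(7.625, 3))
```

With these definitions the bound is below $n$ only from $k = 9$ onwards, and $7.625$ is about $1.375$ times $2^3 \ln 2$.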

Questions regarding the structure of the solution space guided the development of algorithms in similar contexts in the past (two such examples are algorithms that were developed for 3CNF formulas with a planted solution, and the intuition that served the development of the Survey Propagation algorithm). In this paper we limit our study to some structural properties of the solution space and do not address algorithmic aspects, though hopefully our new insights can serve the algorithmic perspective at some point as well.

More precisely, while the algorithmic and structural understanding of below-threshold random 
formulas and above-threshold (for sufficiently large, yet constant, density) is rather thorough (a 
short list for the below threshold regime could be [TOl [III [II [231 il [I] and [111 [El EZl [H [7] for the 
above threshold), there is no rigorous algorithmic result for clause-variable ratio $c > 2^k \ln 2$ when $c$ is some constant above the satisfiability threshold, but not "sufficiently large". (For the special case of $k = 3$ there are some experimental results in [7].)

1.1 Techniques 

One reasonable approach to prove Theorem 1 is to consider the uniform distribution over satisfiable $k$-CNF formulas with $m$ clauses over $n$ variables, and study $r_{\max}(F)$ of a random instance in that distribution. Throughout, $\mathcal{U}_{n,m,k}$ denotes this uniform distribution. More specifically, we consider a random formula $F$ from $\mathcal{U}_{n,m,k}$ and estimate the expected number of pairs of satisfying assignments at distance $xn$ from each other. A similar approach was used for example in [3], [23], [1] for random formulas in the below-threshold regime.

The major additional challenge that we face in the present work is the fact that the uniform distribution $\mathcal{U}_{n,m,k}$ is not a product space: clause appearances are dependent, and it is unclear how to quantify this dependence. On the other hand, in the below-threshold regime, since whp a random $k$-CNF formula is satisfiable, one can study random $k$-CNF formulas instead of satisfiable ones. This distribution, which we denoted above by $\mathcal{F}_{n,m,k}$, is very "close" to a product space (compare with the distribution where every clause is chosen independently at random with probability $p = m/(2^k \binom{n}{k})$, which is already a product space).

One demonstration of this technical challenge is the difficulty of answering the following question: given a fixed assignment $\varphi$, what is the probability that it satisfies a random $F$? If $F$ is drawn from $\mathcal{F}_{n,m,k}$ then the answer is simple, $\Pr[\varphi \models F] = (1 - 2^{-k})^m$. If $F$ is drawn from $\mathcal{U}_{n,m,k}$ then giving an explicit expression (as a function of $m, n, k$) for $\Pr[\varphi \models F]$ is still an open question.
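
To make the "simple" case concrete, here is a small sketch under the assumption that $\mathcal{F}_{n,m,k}$ picks $m$ distinct clauses uniformly from all $2^k \binom{n}{k}$ clauses. In that reading the exact probability is a ratio of binomial coefficients, and $(1 - 2^{-k})^m$ is its product-space approximation. The function names are ours.

```python
from fractions import Fraction
from math import comb

def exact_satisfaction_probability(n, m, k):
    """Pr[a fixed assignment satisfies F] when F has m distinct uniform clauses."""
    total = 2 ** k * comb(n, k)           # all k-clauses over n variables
    good = (2 ** k - 1) * comb(n, k)      # clauses satisfied by the fixed assignment
    return Fraction(comb(good, m), comb(total, m))

def product_approximation(m, k):
    """The expression (1 - 2^(-k))^m quoted in the text."""
    return (1 - 2.0 ** -k) ** m

exact = float(exact_satisfaction_probability(30, 100, 3))
approx = product_approximation(100, 3)
print(exact, approx, exact / approx)
```

For $m = O(n)$ the two values differ by a factor close to 1, which is one way to see why $\mathcal{F}_{n,m,k}$ is "close" to a product space.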

We will show that for $x \ge 50k2^{-k}$ the expected number of pairs of satisfying assignments at distance $xn$ from each other is much smaller than $1/n$. Since there are at most $n$ possible ways to choose $x$, we can use the union bound to prove that whp $F$ has the desired properties (since $\mathcal{U}_{n,m,k}$ is the uniform distribution, showing that the property holds whp translates immediately to a deterministic statement about all but a vanishing fraction of satisfiable formulas).

To derive our estimate on the expected number of pairs of satisfying assignments at distance $xn$ we first analyze a different distribution which is commonly called the planted distribution, and which we shall denote by $\mathcal{P}_{n,m,k}$. To generate a formula according to $\mathcal{P}_{n,m,k}$, fix an assignment uniformly at random, then include $m$ clauses uniformly at random out of the $(2^k - 1)\binom{n}{k}$ clauses that are consistent with the "planted" assignment.
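
The generation procedure just described can be sketched as follows. This is a minimal illustration for small $n$ only, since it enumerates every consistent clause. The clause encoding (a tuple of `(variable, required value)` literals) and the function names are our own choices.

```python
import random
from itertools import combinations, product

def consistent_clauses(planted, k):
    """All k-clauses satisfied by `planted`; a literal (v, s) is true when variable v equals s."""
    clauses = []
    for variables in combinations(range(len(planted)), k):
        for signs in product((False, True), repeat=k):
            if any(planted[v] == s for v, s in zip(variables, signs)):
                clauses.append(tuple(zip(variables, signs)))
    return clauses

def sample_planted(n, m, k, rng):
    """Draw a planted assignment, then m distinct clauses consistent with it."""
    planted = tuple(rng.random() < 0.5 for _ in range(n))
    formula = rng.sample(consistent_clauses(planted, k), m)
    return planted, formula

planted, formula = sample_planted(6, 10, 3, random.Random(0))
```

By construction the planted assignment satisfies every sampled clause, so the formula is always satisfiable.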

When working with $\mathcal{P}_{n,m,k}$, the clauses are nearly independent and calculation is much easier. We then relate the planted model and the uniform model to obtain the desired result. The idea
of translating bounds from the planted to the uniform model was used in [H [H [23] for the below- 
threshold regime, and also in \\.2\ [T3] but in a different context. 

The reader may wonder at this point what happens when $m/n < (1 + 0.99^k)2^k \ln 2$. Do typically all satisfying assignments lie in a low-diameter Hamming ball all the way down to the satisfiability threshold (or even below it)? Numerical and rigorous (tedious) calculations that we did, whose details we omit here, suggest that Theorem 1 can be extended (maybe with some changes in the upper bound on $r_{\max}$) down to $m/n = 2^k \ln 2 + O(k)$ (which is an $O(k)$-additive term from the satisfiability threshold). This extension is done using the same technique of going through the planted distribution. However, when $m/n = 2^k \ln 2 + O(k)$ this technique breaks. In Section 5 we discuss this issue and suggest another technique that may prove useful when our first technique fails. This discussion is part of a more general discussion about the width of the window in which the phase transition in the values of $r_{\max}$ occurs.



2 Relating the uniform and the planted distributions 

Let $u_x$ be a random variable counting the number of pairs of satisfying assignments at distance $xn$ from each other that a random formula in $\mathcal{U}_{n,m,k}$ has. Let $T$ denote the expected number of satisfying assignments that a random formula in $\mathcal{U}_{n,m,k}$ has (that is, the sum over all $2^n$ assignments of the probability of being satisfying), and let $f_x$ be a random variable which denotes the number of satisfying assignments at distance $xn$ from the planted assignment, had $F$ belonged to $\mathcal{P}_{n,m,k}$. The following proposition allows us to upper bound $E[u_x]$ via the more accessible quantity $E[f_x]$.

Proposition 2. Let $F$ be a random formula sampled according to $\mathcal{U}_{n,m,k}$, then

$$E[u_x] = T \cdot E[f_x]/2.$$

(A similar approach of relating the uniform and the planted distribution can be found in [23], though in that case the uniform distribution was the non-conditioned one.)

Proof. For two satisfying assignments $\varphi_i, \varphi_j$ we use $\delta(\varphi_i, \varphi_j)$ to denote their Hamming distance. Consider some ordering on the $2^n$ possible assignments, and let $A_i$ be an indicator variable which is 1 if $\varphi_i$ satisfies $F$. Using this terminology,

$$u_x = \frac{1}{2} \sum_{i,j:\, \delta(\varphi_i,\varphi_j)=xn} A_i A_j.$$

Linearity of expectation gives

$$E[u_x] = \frac{1}{2} \sum_{i,j:\, \delta(\varphi_i,\varphi_j)=xn} \Pr[A_i \wedge A_j] = \frac{1}{2} \sum_{i,j:\, \delta(\varphi_i,\varphi_j)=xn} \Pr[A_i \mid A_j] \Pr[A_j].$$

By symmetry, the latter equals

$$\frac{2^n}{2} \Pr[A_j] \sum_{i:\, \delta(\varphi_i,\varphi_j)=xn} \Pr[A_i \mid A_j].$$

It remains to estimate $\Pr[A_i \mid A_j]$. Conditioning on the event $A_j$ means conditioning on the fixed assignment $\varphi_j$ to be satisfying. In turn, $\mathcal{U}_{n,m,k}$ conditioned on $\varphi_j$ being a satisfying assignment means that only clauses which are satisfied by $\varphi_j$ can be included, and by symmetry, every set of $t$ clauses satisfied by $\varphi_j$ has the same probability of being included. Observe that for $t = m$ this is exactly the definition of the planted distribution $\mathcal{P}_{n,m,k}$. Therefore $\sum_i \Pr[A_i \mid A_j] = E[f_x]$, when summing over all assignments $\varphi_i$ at distance $xn$ from $\varphi_j$. Furthermore, $T = \sum_j \Pr[A_j]$ (now we are summing over all $2^n$ assignments), and hence $\Pr[A_j] = T/2^n$. Putting everything together we derive

$$E[u_x] = T \cdot E[f_x]/2.$$
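
The identity can be checked exhaustively on a toy instance. The sketch below is our own brute-force illustration, feasible only for tiny parameters. It enumerates all formulas with $m$ distinct clauses, keeps the satisfiable ones (the uniform model), and compares $E[u_x]$ with $T \cdot E[f_x]/2$, where $E[f_x]$ is computed over the formulas satisfied by one fixed "planted" assignment. Distances are given as integers $d = xn$.

```python
from fractions import Fraction
from itertools import combinations, product

def all_clauses(n, k):
    # A clause is a tuple of k literals (variable, required value) on distinct variables.
    return [tuple(zip(vs, signs))
            for vs in combinations(range(n), k)
            for signs in product((False, True), repeat=k)]

def satisfies(assignment, formula):
    return all(any(assignment[v] == s for v, s in clause) for clause in formula)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def check_proposition_2(n=3, k=2, m=4):
    assignments = list(product((False, True), repeat=n))
    # Solution sets of all formulas with m distinct clauses; unsatisfiable ones are dropped.
    solution_sets = []
    for formula in combinations(all_clauses(n, k), m):
        sols = [a for a in assignments if satisfies(a, formula)]
        if sols:
            solution_sets.append(sols)
    # T: expected number of satisfying assignments in the uniform model.
    T = Fraction(sum(len(s) for s in solution_sets), len(solution_sets))
    planted = assignments[0]  # by symmetry any fixed assignment gives the same E[f_x]
    planted_sets = [s for s in solution_sets if planted in s]
    rows = {}
    for d in range(1, n + 1):
        pairs = sum(1 for s in solution_sets
                    for a, b in combinations(s, 2) if hamming(a, b) == d)
        e_u = Fraction(pairs, len(solution_sets))
        near = sum(1 for s in planted_sets for a in s if hamming(a, planted) == d)
        e_f = Fraction(near, len(planted_sets))
        rows[d] = (e_u, T * e_f / 2)   # left- and right-hand side of Proposition 2
    return T, rows

T, rows = check_proposition_2()
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.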
In [23] this sort of proposition was already enough to estimate $E[u_x]$ since $T$ can be easily calculated when $m/n$ is below the satisfiability threshold. However, in $\mathcal{U}_{n,m,k}$ with $m/n$ above the satisfiability threshold, it is not clear how to calculate $T$. The following lemma is then useful (the proof can also be found in [1], and is given here for completeness).

Lemma 3. Let $W$ be the expected number of satisfying assignments of a random $\mathcal{P}_{n,m,k}$ instance. Then always $T \le W$.

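Proof. Let $t_i$ be the number of formulas on $n$ variables and $m$ clauses which have exactly $i$ satisfying assignments. Let $p_i$ be the probability that a formula with exactly $i$ satisfying assignments is sampled from $\mathcal{U}_{n,m,k}$, and let $q_i$ be defined similarly for $\mathcal{P}_{n,m,k}$. Observe that due to symmetry, sampling a formula from $\mathcal{P}_{n,m,k}$ is exactly equivalent to sampling a pair $(\varphi, F)$ uniformly at random from all pairs such that $\varphi$ is an assignment and $F$ is a formula satisfied by $\varphi$. Hence:

$$p_i = \frac{t_i}{\sum_{j \ge 1} t_j}, \qquad q_i = \frac{i \cdot t_i}{\sum_{j \ge 1} j \cdot t_j},$$

and

$$T = \sum_{i \ge 1} i \cdot p_i = \frac{\sum_{i \ge 1} i \cdot t_i}{\sum_{i \ge 1} t_i} \le \frac{\sum_{i \ge 1} i^2 \cdot t_i}{\sum_{i \ge 1} i \cdot t_i} = \sum_{i \ge 1} i \cdot q_i = W.$$

This is just Cauchy-Schwarz, $(\sum_i a_i b_i)^2 \le (\sum_i a_i^2)(\sum_i b_i^2)$, with $a_i = \sqrt{t_i}$ and $b_i = i \cdot \sqrt{t_i}$.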
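
The inequality depends only on the counts $t_i$, so it can be illustrated with any table of counts. In the sketch below the counts are made up for the example, and the function name is ours.

```python
from fractions import Fraction

def uniform_and_planted_means(t):
    """t maps i >= 1 to t_i, the number of formulas with exactly i satisfying assignments."""
    total = sum(t.values())
    weighted = sum(i * ti for i, ti in t.items())
    squared = sum(i * i * ti for i, ti in t.items())
    T = Fraction(weighted, total)     # mean of i under p_i = t_i / sum_j t_j
    W = Fraction(squared, weighted)   # mean of i under q_i = i t_i / sum_j j t_j
    return T, W

# Hypothetical counts: 5 formulas with one solution, 3 with two, 1 with four.
T, W = uniform_and_planted_means({1: 5, 2: 3, 4: 1})
print(T, W)
```

The planted model weights each formula by its number of solutions, which is why $W$ can only exceed $T$; the two coincide when all satisfiable formulas have the same number of solutions.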



3 The Planted Setting 

In this section we analyze $W$ and $E[f_x]$. Recall that we use $W$ to denote the expected number of satisfying assignments that a random formula in $\mathcal{P}_{n,m,k}$ has, and $f_x$ counts the number of satisfying assignments at distance $xn$ from the planted assignment, had $F$ belonged to $\mathcal{P}_{n,m,k}$.

Our analysis of $E[f_x]$ is composed of two regimes. The first is the case $x \in [0, 1/k]$. In this regime we know that $E[f_x]$ changes from $\omega(1)$ to $o(1)$. This phenomenon is depicted in Figure 1. The y-axis in the plot is $f^*(x)$ such that $E[f_x] = e^{f^*(x)n}$, the x-axis is the Hamming distance from the planted assignment. Therefore the transition from $E[f_x] = \omega(1)$ to $E[f_x] = o(1)$ corresponds to $f^*(x)$ changing from positive to negative.

To translate our results to the uniform setting, it turns out that we need to have a more precise control on the rate at which $E[f_x]$ decreases once changing to $o(1)$. Therefore the analysis of that regime is more careful (Proposition 6). Then we analyze the case $x \in [1/k, 1]$. In this regime, for a suitable choice of $\epsilon$ (recall $m/n = (1+\epsilon)2^k \ln 2$), $E[f_x]$ is constantly $o(1)$ (in fact, exponentially small in $n$). Therefore a cruder analysis will suffice (Proposition 5). This corresponds in Figure 1 to the fact that the curve is bounded away below the x-axis in that range.

Figure 1: Plot of $f^*(x)$ for $k = 6$ and $\epsilon = 2$



In this section we consider a slight modification of $\mathcal{P}_{n,m,k}$. Instead of choosing $m$ clauses u.a.r., we choose $m$ clauses with repetitions. However, for $m/n = O(1)$, the expected number of pairs of identical clauses in $F$ (in the modified model) is $O(m^2/n^k)$. Thus, for $k \ge 3$ this quantity is $o(1)$. Therefore, as standard calculations show, every property that holds with probability $q$ in the modified model holds with probability $q(1 + o(1))$ in $\mathcal{P}_{n,m,k}$. Somewhat abusing notation, we will denote the modification also by $\mathcal{P}_{n,m,k}$.

Let us start with formulating E[fx] in a way which is convenient to work with. 



Lemma 4.

$$E[f_x] \le \binom{n}{xn} \left(1 - \frac{1 - (1-x)^k}{2^k - 1}\right)^m.$$


Proof. Fix an assignment $\psi$ at distance $xn$ from the planted assignment $\varphi$. The probability that $\psi$ also satisfies $F$ can be calculated in the following manner. Let $A$ be the set of variables on which both $\psi$ and $\varphi$ agree, $|A| = (1-x)n$. Consider a random clause $C$ satisfied by $\varphi$. If all $k$ variables in that clause fall in $A$, then $C$ is surely satisfied by $\psi$. The probability for that is $q = \binom{(1-x)n}{k} / \binom{n}{k}$. If at least one variable falls out of $A$, which happens with probability $1 - q$, then the clause is satisfied only with probability $\frac{2^k - 2}{2^k - 1}$. This is because there is one way to complement the variables which is not consistent with $\psi$ but is consistent with $\varphi$. There are $\binom{n}{xn}$ ways to fix $\psi$, and therefore

$$E[f_x] = \binom{n}{xn} \left(q \cdot 1 + (1-q)\frac{2^k - 2}{2^k - 1}\right)^m = \binom{n}{xn} \left(\frac{2^k - 2 + q}{2^k - 1}\right)^m = \binom{n}{xn} \left(1 - \frac{1 - q}{2^k - 1}\right)^m.$$

Finally, observing that $q \le (1-x)^k$ proves the lemma.
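
The per-clause probability in this proof can be confirmed by direct counting. The following sketch (our own illustration for small $n$) fixes $\varphi$ to be all-true, flips $d = xn$ variables to obtain $\psi$, and compares the fraction of $\varphi$-consistent clauses that $\psi$ also satisfies with $1 - (1-q)/(2^k - 1)$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def both_satisfy_fraction(n, k, d):
    """Among clauses satisfied by phi, the fraction also satisfied by psi (Hamming distance d)."""
    phi = (True,) * n
    psi = (False,) * d + (True,) * (n - d)
    sat_phi = sat_both = 0
    for variables in combinations(range(n), k):
        for signs in product((False, True), repeat=k):
            if any(phi[v] == s for v, s in zip(variables, signs)):
                sat_phi += 1
                if any(psi[v] == s for v, s in zip(variables, signs)):
                    sat_both += 1
    return Fraction(sat_both, sat_phi)

def lemma_4_clause_probability(n, k, d):
    """The closed form 1 - (1 - q) / (2^k - 1) with q = C(n - d, k) / C(n, k)."""
    q = Fraction(comb(n - d, k), comb(n, k))
    return 1 - (1 - q) / (2 ** k - 1)
```

For example, `both_satisfy_fraction(7, 3, 2)` and `lemma_4_clause_probability(7, 3, 2)` return the same rational number.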



It will be more convenient to work with the following quantity:

$$f^*(x) = \frac{\ln E[f_x]}{n}. \qquad (3.1)$$

One can verify that

$$f^*(x) \le H(x) \ln 2 + c \ln\left(1 - \frac{1 - (1-x)^k}{2^k - 1}\right), \qquad (3.2)$$

where $H(x)$ denotes the binary entropy measure,

$$H(x) = -(1-x)\log_2(1-x) - x\log_2 x,$$

and $c = m/n = (1+\epsilon)2^k \ln 2$.

To make use of Proposition 2 we need to obtain tight bounds on $W$ and $E[f_x]$. In terms of $f^*$, $E[f_x] = e^{f^*(x)n}$, therefore to prove $E[f_x] = o(1)$ it suffices to prove $f^*(x) < 0$. This is exactly what the following two propositions formally establish.
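
For readers who want to experiment with the bound (3.2), a minimal numerical sketch follows. The function names and the sample values of $k$ and $\epsilon$ are our own, and `log1p`/`expm1` are used only to keep the evaluation accurate for very small $x$.

```python
import math

def entropy_nats(x):
    """H(x) ln 2, the binary entropy measured in nats (0 at the endpoints)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log1p(-x)

def f_star_upper_bound(x, k, eps):
    """Right-hand side of (3.2) with c = (1 + eps) 2^k ln 2."""
    c = (1 + eps) * 2 ** k * math.log(2)
    # 1 - (1 - x)^k, computed stably for small x.
    miss = -math.expm1(k * math.log1p(-x)) if x < 1.0 else 1.0
    return entropy_nats(x) + c * math.log1p(-miss / (2 ** k - 1))

k = 20
print(f_star_upper_bound(2.0 ** -k, k, 0.01))        # very close to the planted assignment
print(f_star_upper_bound(20 * 2.0 ** -k, k, 0.01))   # slightly further away
print(f_star_upper_bound(0.5, k, 0.99 ** k))         # deep inside [1/k, 1]
```

With these sample parameters the bound is positive at $x = 2^{-k}$ and negative at $x = 20 \cdot 2^{-k}$, which matches the sign change described at the start of this section.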

Proposition 5. For any $k > 20$, $\epsilon > 0.99^k$ and $x \in [1/k, 1]$,

$$f^*(x) \le -50k2^{-k}.$$

Proof. Throughout, we use the following useful upper bound on $\ln(1-x)$:

$$\ln(1-x) \le -x.$$

We break the interval into two subintervals. Let us first consider $x \in [0.3, 1]$. Always $H(x) \ln 2 \le \ln 2$, and on the other hand, using $\ln(1-x) \le -x$,

$$c \ln\left(1 - \frac{1 - (1-x)^k}{2^k - 1}\right) \le -(1+\epsilon)(\ln 2)\left(1 - (1-x)^k\right).$$

Therefore it suffices to prove that $(1+\epsilon)(1 - (1-x)^k) \ge 1 + (50k2^{-k}/\ln 2)$ for every $x \in [0.3, 1]$. Indeed,

$$(1 - (1-x)^k) \ge (1 - 0.7^k), \qquad (1+\epsilon) \ge (1 + 0.99^k).$$

One can verify that for $k > 20$, the product of these two quantities is always greater than $1 + (50k2^{-k}/\ln 2)$.

Let us now move to the case $x \in [1/k, 0.3]$. $H(x)$ is monotonically increasing until $x = 0.5$, therefore it takes its maximal value in this interval at $x = 0.3$, which gives $H(0.3) \le 0.266$. On the other hand $(1 - (1-x)^k)$ takes its minimal value at $1/k$. Observe that $(1 - 1/k)^k \le e^{-1}$, and therefore

$$(1 - (1-x)^k) \ge 1 - 1/e \ge 0.6 > 0.266 \ge H(0.3).$$

In this case we have $f^*(x) \le 0.266 - 0.6 \le -0.3 \le -50k2^{-k}$ for every $k > 20$. ■

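Proposition 6. For any $k > 20$, $\epsilon > 0$ and $\lambda \in [20, 2^k/k]$, if $x = \lambda 2^{-k}$ then $f^*(x) \le -\lambda 2^{-k}$.

Proof. For any $x$, we have

$$\ln(1-x) \le -x,$$

and, for $0 \le x \le 1$,

$$1 - (1-x)^k \ge kx - \frac{k^2x^2}{2}.$$

Thus,

$$H(x)\ln 2 + c\ln\left(1 - \frac{1 - (1-x)^k}{2^k - 1}\right) = -x\ln x - (1-x)\ln(1-x) + (1+\epsilon)2^k(\ln 2)\ln\left(1 - \frac{1 - (1-x)^k}{2^k - 1}\right) \le -x\ln x + x - (1+\epsilon)(\ln 2)\left(kx - \frac{k^2x^2}{2}\right).$$

Substituting $\lambda 2^{-k}$ for $x$, this upper bound becomes

$$-x\ln x + x - (1+\epsilon)(\ln 2)\left(kx - \frac{k^2x^2}{2}\right)$$
$$= \lambda 2^{-k}\left(k(\ln 2) - \ln\lambda\right) + \lambda 2^{-k} - (1+\epsilon)(\ln 2)\left(k\lambda 2^{-k} - k^2\lambda^2 2^{-2k-1}\right)$$
$$= -(\lambda\ln\lambda)2^{-k} + \lambda 2^{-k} - \epsilon(\ln 2)\left(k\lambda 2^{-k} - k^2\lambda^2 2^{-2k-1}\right) + (\ln 2)k^2\lambda^2 2^{-2k-1}$$
$$= -\lambda 2^{-k}\left((\ln\lambda) - 1 + \epsilon(\ln 2)\left(k - k^2\lambda 2^{-k-1}\right) - (\ln 2)k^2\lambda 2^{-k-1}\right)$$
$$= -\lambda 2^{-k}\left((\ln\lambda)\left(1 - (\ln 2)\frac{k^2\lambda 2^{-k-1}}{\ln\lambda}\right) - 1 + \epsilon(\ln 2)\left(k - k^2\lambda 2^{-k-1}\right)\right).$$

Observe that $\lambda \le 2^k/k$ and thus,

$$k - k^2\lambda 2^{-k-1} \ge 0,$$

and since $\epsilon > 0$ it suffices to prove that

$$(\ln\lambda)\left(1 - (\ln 2)\frac{k^2\lambda 2^{-k-1}}{\ln\lambda}\right) - 1 \ge 1.$$

Since $\lambda \le 2^k/k$, and $k > 5$, we have

$$(\ln 2)\frac{k^2\lambda 2^{-k-1}}{\ln\lambda} \le (\ln 2)\frac{k^2 (2^k/k) 2^{-k-1}}{\ln(2^k/k)} = (\ln 2)\frac{k}{2(k\ln 2 - \ln k)} < 0.65,$$

and so it suffices to verify that

$$\ln\lambda \ge 2/(1 - 0.65),$$

which is always true for $\lambda \in [20, 2^k/k]$ (for $k > 20$, $2^k/k > 20$).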
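
The conclusion of Proposition 6 can also be checked numerically, independently of the constants used in the proof. The sketch below (names and grid resolution are ours) evaluates the relaxed upper bound $-x\ln x + x - (1+\epsilon)(\ln 2)(kx - k^2x^2/2)$ at $x = \lambda 2^{-k}$ on a log-spaced grid of $\lambda \in [20, 2^k/k]$ and compares it with $-\lambda 2^{-k}$.

```python
import math

def relaxed_bound(lam, k, eps):
    """The upper bound on f*(x) used in the proof, evaluated at x = lam * 2^(-k)."""
    x = lam * 2.0 ** -k
    return -x * math.log(x) + x - (1 + eps) * math.log(2) * (k * x - (k * x) ** 2 / 2)

def proposition_6_holds_on_grid(k, eps, steps=400):
    """Check relaxed_bound <= -lam * 2^(-k) on a log-spaced grid over [20, 2^k / k]."""
    lo, hi = 20.0, 2.0 ** k / k
    for i in range(steps + 1):
        lam = lo * (hi / lo) ** (i / steps)
        if relaxed_bound(lam, k, eps) > -lam * 2.0 ** -k:
            return False
    return True

print(proposition_6_holds_on_grid(20, 1e-9))
```

A grid check is of course not a proof; it is meant only as a sanity test of the stated inequality for a few sample values of $k$ and a very small $\epsilon$.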

4 Proof of Theorem 1

Recall Proposition 2 and Lemma 3, which together establish

$$E[u_x] \le W \cdot E[f_x]/2.$$

$W$ is the expected number of satisfying assignments in the planted model, $W = \sum_x E[f_x]$.

The idea of the proof is to use Propositions 5 and 6 to upper bound $W$ by looking at the largest $x$ s.t. $E[f_x]$ contributes to $W$ (that is, $E[f_x]$ is not vanishing with $n$). We shall use $x_0$ to denote this number (regardless, observe that $x_0$ is an upper bound on the diameter of the cluster region in the planted setting). Then, to beat $W$, we take $x_1 > x_0$, so that for every $x \ge x_1$, $E[f_x] \cdot W \ll 1$. Respectively, $x_1$ upper bounds the diameter of the cluster region in the uniform setting. It turns out that $x_1/x_0 = O(k)$, and since $x_0$ scales down with $2^{-k}$, this additional factor is manageable.

Formally, Propositions 5 and 6 assert that only $x \le 20 \cdot 2^{-k}$ may contribute to the value of $W$. Indeed, take $x_0 = 20 \cdot 2^{-k}$, then $E[f_x] = o(n^{-1})$ for every $x \ge x_0$. For $x \le x_0$, the total number of possible assignments (which obviously bounds the expected number of satisfying assignments) at distance $xn$ from the planted is

$$\binom{n}{xn} \le \left(\frac{en}{xn}\right)^{xn}.$$

This quantity is maximized for $x \le x_0$ at $x_0$, which gives at most $e^{(k\ln 2 + 1) \cdot 20 \cdot 2^{-k}n}$. Therefore, for sufficiently large $n$,

$$W \le o(1) + \sum_{x \le x_0} e^{(k\ln 2 + 1) 20 \cdot 2^{-k}n} \le e^{40k2^{-k}n}.$$

Now take $x_1 = 50k2^{-k}$ (for $k > 20$, $50k \le 2^k/k$, which is the maximal $\lambda$ allowed); applying Propositions 5 and 6 once more gives that for $x \ge x_1$,

$$E[f_x] \le e^{-50k2^{-k}n}.$$

In turn, for $x \ge x_1$,

$$E[u_x] \le W \cdot E[f_x]/2 \le e^{40k2^{-k}n} \cdot e^{-50k2^{-k}n} = e^{-10k2^{-k}n}.$$

Using Markov's inequality, for $x \ge x_1$,

$$\Pr[u_x > 0] \le e^{-10k2^{-k}n}.$$

Applying the union bound,

$$\Pr[\exists x \ge 50k2^{-k} : u_x > 0] \le n \cdot e^{-10k2^{-k}n} = o(1).$$
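
The constants in this argument can be sanity-checked numerically. The sketch below (our own helper names) verifies the two arithmetic facts used above, $(k\ln 2 + 1) \cdot 20 \le 40k$ and $50k \le 2^k/k$, and evaluates the final union bound $n \cdot e^{-10k2^{-k}n}$ for concrete $n$.

```python
import math

def proof_constants_hold(k):
    """Check (k ln 2 + 1) * 20 <= 40 k and 50 k <= 2^k / k."""
    exponent_ok = (k * math.log(2) + 1) * 20 <= 40 * k
    lambda_ok = 50 * k <= 2 ** k / k
    return exponent_ok and lambda_ok

def union_bound(k, n):
    """The final failure-probability bound n * exp(-10 k 2^(-k) n)."""
    return n * math.exp(-10 * k * 2.0 ** -k * n)

print(proof_constants_hold(20), union_bound(20, 10 ** 3), union_bound(20, 10 ** 6))
```

Since the exponent carries a factor $2^{-k}$, the bound is only meaningful once $n$ is large compared with $2^k/k$: for $k = 20$ it still exceeds 1 at $n = 10^3$ and is negligible at $n = 10^6$.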



5 Moving even closer to the threshold 

In the previous sections we showed that when $m/n > (1 + 0.99^k)2^k \ln 2$, for $k > 20$, whp there are no pairs of satisfying assignments at distance greater than $50k2^{-k}n$ from each other (Theorem 1). Our approach was to consider the planted distribution and estimate $E[f_x]$ - the expected number of satisfying assignments at distance $xn$ from the planted assignment. Then we used Proposition 2 to relate this quantity to $E[u_x]$ - the expected number of pairs of satisfying assignments at distance $xn$ from each other (in the uniform setting). The relation we established was given (via Proposition 2 and Lemma 3) by

$$E[u_x] \le W \cdot E[f_x].$$

$W$ is the expected number of satisfying assignments in $\mathcal{P}_{n,m,k}$.

Observe that $W$ is always at least 1, and therefore using this relation to show that $E[u_x] = o(1)$ makes sense only when $E[f_x] = o(1)$. However, using (rather tedious) calculations one can show that when $m/n = 2^k \ln 2 + O(k)$ there exists $x \in [0.5 - O(2^{-k}), 0.5]$ such that $E[f_x]$ is exponentially large in $n$ (details omitted). This phenomenon is depicted in Figure 2. Therefore from this density downwards our method breaks (observe that $E[f_x]$ is monotonically decreasing and continuous in $m/n$). This phenomenon is demonstrated by comparing Figure 2 with Figure 1.

Figure 2: Plot of $f^*(x)$ for $k = 6$ and $\epsilon = -2$

Compare the plots in those figures. Both depict $f^*(x) = \ln E[f_x]/n$ on the y-axis and the distance from the planted assignment on the x-axis. To generate the plots we used the estimate on $E[f_x]$ given in Lemma 4. Although Lemma 4 establishes an upper bound on $E[f_x]$, in fact for $x$ bounded away from 0 equality holds (up to an $o(1)$ additive factor inside the parenthesis). Since $E[f_x]$ is monotonically decreasing in $m/n$ and continuous, as $m/n$ gets smaller, the "hunchback" around $x = 1/2$ gets closer to the x-axis, and at some ratio crosses it to become positive. This ratio occurs at $m/n = 2^k \ln 2 + O(k)$. As $k$ grows, the hunchback (regardless if above or below the x-axis) becomes narrower, and in general is concentrated in an interval of width $O(2^{-k})$ around $1/2$, with the maximum occurring at $1/2 - O(2^{-k})$. We have validated these claims using a combination of numerical and rigorous calculations (details omitted here).

In this section we suggest a new technique which refines the one we used. Using our refined technique we can prove for example that at some settings, even though $E[f_x]$ is exponential in $n$ (which means that our original technique fails), in fact whp $f_x = 0$. Hopefully this refinement can benefit the uniform distribution as well. We do not discuss this point in the present paper.

The key to the refinement is to replace $f_x$ with another quantity, $f_x^{\max}$, which counts maximal satisfying assignments at distance $xn$ from the planted assignment. This notion is similar to the notion of minimal satisfying assignments used in [20].

To demonstrate the power of this new technique we describe a setting where $E[f_x] \ge 1$ (which means that our original technique fails) for some $x \in [0.3, 0.6]$, but $E[f_x^{\max}] = o(1)$ for all $x \in [0.3, 0.6]$, and in that setting this will imply that whp $f_x = 0$. Formally, we prove that:

Proposition 7. There exists a non-empty interval $(\epsilon_2, \epsilon_1)$ such that for every $\epsilon \in (\epsilon_2, \epsilon_1)$ and $F$ distributed according to $\mathcal{P}_{n,m,k}$ with $m/n = (1+\epsilon)2^k \ln 2$, there exists $x \in [0.3, 0.6]$ so that $E[f_x] \ge 1$ while whp $f_x = 0$ for every $x$ in that interval.
We choose the value $\epsilon_2$ carefully (we will shortly describe how), and for that $\epsilon_2$ we can verify numerically that

Assumption 8. Let $F$ be distributed according to $\mathcal{P}_{n,m,k}$. If $m/n > (1+\epsilon_2)2^k \ln 2$ then whp $F$ has no satisfying assignments at distance $xn$ from the planted assignment for $0.2 < x < 0.3$ or $0.6 < x < 1.0$.

Since we are only interested in demonstrating the power of this technique, we do not care in the context of the present paper about turning it into a rigorous claim.

Let us now formally define the notion of maximal satisfying assignments. 

Definition 9. Given a planted instance $F$ with a planted assignment $\varphi$, we say that a satisfying assignment $\psi'$ of $F$ is maximal if every assignment $\psi$ that disagrees with both $\psi'$ and $\varphi$ on some variable $x_i$ does not satisfy $F$.

In that sense $\psi'$ is at a maximal Hamming distance from $\varphi$. For example, if the complement of the planted assignment also satisfies $F$, then it is maximal (in a vacuous way). It is easily proven that $F$ has a satisfying assignment if and only if $F$ has a maximal satisfying assignment.

Let $\epsilon_1$ be the maximal value such that for $m/n = (1+\epsilon_1)2^k \ln 2$ and some $x \in [0.3, 0.6]$,

$$E[f_x] \ge 1.$$

Let $\epsilon_2$ be the minimal value such that for $m/n = (1+\epsilon_2)2^k \ln 2$ and every $x \in [0.3, 0.6]$,

$$E[f_x^{\max}] \le n^{-2}.$$

The proofs of Propositions 5 and 6 show that $\epsilon_2$ always exists, and we have verified the existence of $\epsilon_1$ numerically. The condition $E[f_x^{\max}] \le n^{-2}$ for $x \in [0.3, 0.6]$ easily translates to the following claim: whp there are no maximal satisfying assignments at distance $xn$ for $x \in [0.3, 0.6]$. This follows from Markov's inequality, which gives an upper bound of $n^{-2}$ on the probability that $f_x^{\max} > 0$ (for a fixed $x$). Now take the union bound over at most $n$ possible values of $x$.

Before proving Proposition 7 we still need to show that the interval $(\epsilon_2, \epsilon_1)$ is not empty.

Proposition 10. $\epsilon_2 < \epsilon_1$.

Proof. Fix $x \in [0.3, 0.6]$, and consider a random formula $F$ from $\mathcal{P}_{n,m,k}$. Let $M_i$ be the event that $\psi_i$ at distance $xn$ from the planted assignment $\varphi$ is maximal, and $A_i$ the event that $\psi_i$ satisfies $F$. Using this terminology:

$$E[f_x^{\max}] = \sum_{i:\, \delta(\psi_i,\varphi)=xn} \Pr[A_i \wedge M_i] = \sum_i \Pr[M_i \mid A_i]\Pr[A_i] = \Pr[M_i \mid A_i] E[f_x]. \qquad (5.1)$$

In the last step we used the fact that $\Pr[M_i \mid A_i]$ is the same for every $\psi_i$ by symmetry, and therefore we can pull it out in front of the summation. It remains to estimate $\Pr[M_i \mid A_i]$. Conditioning on the event $A_i$ in the planted model means conditioning on the fixed assignment $\psi_i$ to be satisfying in addition to the planted assignment. In other words this means that only clauses which are satisfied by both $\psi_i$ and $\varphi$ can be included. By symmetry, every set of $t$ clauses satisfied by both has the same probability of being included. Observe that for $t = m$ this is exactly the definition of the doubly-planted distribution (the distribution where to begin with two planted assignments are respected).

A standard approach is to consider the following variation of the doubly-planted model: pick every clause satisfied by both $\psi_i$ and $\varphi$ w.p. $p$, where $p$ satisfies $p = m/|S|$, $S$ being the set of clauses which are satisfied by both $\psi_i, \varphi$. For the properties that interest us, it is straightforward to translate results between these two models. It is also easy to see that $|S| \ge (2^k - 2)\binom{n}{k}$.

Now consider a variable $s$ in $\psi_i$ whose assignment agrees with $\varphi$ and w.l.o.g. assume it is TRUE. We call a clause $C$ $s$-qualifying for $\psi_i$ if it takes the form $(s \vee \ell_{y_1} \vee \ell_{y_2} \vee \cdots \vee \ell_{y_{k-1}})$, where $\ell_{y_j}$ is a FALSE literal (over the variable $y_j$) under $\psi_i$. If $\psi_i$ is maximal then at least one of the $\binom{n-1}{k-1}$ $s$-qualifying clauses had to be included. The probability that at least one such clause is included is at most

$$1 - (1-p)^{\binom{n-1}{k-1}} \le 1 - e^{-km/(n(2^k-2))}.$$

Next we observe that $\psi_i$ has at least $(1-x)n$ variables which are assigned according to $\varphi$. Also observe that the set of $s$-qualifying clauses is disjoint from the set of $q$-qualifying clauses for $q \ne s$. Finally, for $\psi_i$ to be maximal there must be at least one $s$-qualifying clause in $F$ for every such variable $s$. The probability for that is at most

$$\Pr[M_i \mid A_i] \le \left(1 - e^{-km/(n(2^k-2))}\right)^{(1-x)n} \le \left(1 - (1-x)e^{-km/(n(2^k-2))}\right)^n \le a^n, \qquad (5.2)$$

for some $a = a(k) < 1$ (here we assumed that $x \in [0.3, 0.6]$ and therefore $(1-x) \in [0.4, 0.7]$). Combining Equations (5.1) and (5.2) we derive

$$E[f_x^{\max}] \le E[f_x] \cdot a^n. \qquad (5.3)$$

We claim that this implies $\epsilon_1 - \epsilon_2 \ge h$ for some $h = h(k) > 0$ ($h$ actually depends on $a$, but $a$ depends only on $k$). Fix some $b = b(k) > 1$ s.t. $b \cdot a < 1$ (since $a = a(k) < 1$, such $b$ exists). Since $E[f_x]$ is continuous and decreasing in $m/n$, and by the maximality of $\epsilon_1$, we can find $h = h(k) > 0$ s.t. $E[f_x] \le b^n$ for all $x \in [0.3, 0.6]$ when $m/n \ge (1 + \epsilon_1 - h)2^k \ln 2$. On the other hand, as Equation (5.3) implies, $E[f_x^{\max}] \le b^n \cdot a^n = (ab)^n \le n^{-2}$ (for sufficiently large $n$) for all $x \in [0.3, 0.6]$. By the minimality of $\epsilon_2$ this in particular implies that $\epsilon_2 \le \epsilon_1 - h$. ■
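
To get a feel for the constants $a$ and $b$, the sketch below evaluates the base of the bound (5.2) at the worst case $x = 0.6$ and picks one admissible $b$. The function names and the particular choice $b = a^{-1/2}$ are ours; any $b \in (1, 1/a)$ would do.

```python
import math

def maximality_base(k, eps, x_max=0.6):
    """a = 1 - (1 - x_max) * exp(-k c / (2^k - 2)) with c = (1 + eps) 2^k ln 2."""
    c = (1 + eps) * 2 ** k * math.log(2)
    return 1 - (1 - x_max) * math.exp(-k * c / (2 ** k - 2))

def choose_b(a):
    """One valid choice of b > 1 with a * b < 1, namely b = a^(-1/2)."""
    return a ** -0.5

a = maximality_base(6, 0.1)
b = choose_b(a)
print(a, b, a * b)
```

Because $e^{-kc/(2^k-2)}$ is roughly $2^{-k}$, the base $a$ is extremely close to 1 for large $k$, so the resulting gap $h$ between $\epsilon_2$ and $\epsilon_1$ should be expected to be small.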

Proof. (Proposition 7) Fix some $\epsilon \in (\epsilon_2, \epsilon_1)$ and consider a random formula $F$ in $\mathcal{P}_{n,m,k}$ so that $m/n = (1+\epsilon)2^k \ln 2$. By the choice of $\epsilon > \epsilon_2$, it holds that whp $F$ has no maximal satisfying assignments at distance $xn$ from the planted assignment for $x \in [0.3, 0.6]$. Assume that indeed this is the case, and also assume that Assumption 8 holds.

By the choice of $\epsilon < \epsilon_1$ and the maximality of $\epsilon_1$, for some $x_1 \in [0.3, 0.6]$ indeed $E[f_{x_1}] \ge 1$. We shall now show that $f_x = 0$ for all $x \in [0.3, 0.6]$. Assume by contradiction that $f_x > 0$ for some $x \in [0.3, 0.6]$. Namely, there exists a satisfying assignment $\psi$ at distance $xn$ from the planted assignment $\varphi$. Construct the assignment $\psi'$ in the following manner: while possible, flip the assignment of a variable that agrees with $\varphi$ such that the flip leaves the assignment satisfying. By construction it is clear that $\psi'$ is maximal. The crucial observation now is that at each iteration of the process we increase the distance between the current assignment and the planted by exactly one. Specifically, we start the procedure with an assignment at distance $xn$ for $x \in [0.3, 0.6]$, and keep increasing the distance. If the final distance $yn$ is s.t. $y \notin [0.3, 0.6]$ then at some point we have reached a satisfying assignment at distance greater than $0.6n$. This contradicts Assumption 8. Therefore we have that $\psi'$, a maximal satisfying assignment already, is at distance $yn$ for $y \in [0.3, 0.6]$. This however contradicts our assumption that no maximal satisfying assignments exist in that range. ■
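
The flipping procedure used in this proof is easy to state as code. The sketch below is our own illustration on a toy formula. It reads maximality in the single-flip sense that the construction relies on (no variable agreeing with $\varphi$ can be flipped while keeping the assignment satisfying), and it records the distance from the planted assignment after every step. The clause encoding and function names are assumptions made for the example.

```python
def satisfies(assignment, formula):
    # A literal (v, s) is true when variable v is assigned the value s.
    return all(any(assignment[v] == s for v, s in clause) for clause in formula)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def flippable(formula, psi, planted):
    """Variables on which psi agrees with planted and whose flip keeps psi satisfying."""
    result = []
    for v in range(len(psi)):
        if psi[v] == planted[v]:
            flipped = psi[:v] + (not psi[v],) + psi[v + 1:]
            if satisfies(flipped, formula):
                result.append(v)
    return result

def is_maximal(formula, psi, planted):
    return satisfies(psi, formula) and not flippable(formula, psi, planted)

def extend_to_maximal(formula, psi, planted):
    """Flip agreeing variables while possible; return the final assignment and the distances visited."""
    distances = [hamming(psi, planted)]
    while True:
        candidates = flippable(formula, psi, planted)
        if not candidates:
            return psi, distances
        v = candidates[0]
        psi = psi[:v] + (not psi[v],) + psi[v + 1:]
        distances.append(hamming(psi, planted))

# Toy instance: (x0 or x1) and (x1 or not x2), planted assignment all-true.
planted = (True, True, True)
formula = [((0, True), (1, True)), ((1, True), (2, False))]
final, distances = extend_to_maximal(formula, planted, planted)
```

On this toy instance the procedure flips $x_0$ and then $x_2$, and the recorded distances increase by exactly one per step, as the proof observes.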